567 research outputs found
Modeling variable dependencies between characters in Chinese information retrieval
Abstract. Chinese IR can work on words and/or character n-grams. In previous studies, when several types of index are used, independence is usually assumed between them, which obviously is not true in reality. In this paper, we propose a model for Chinese IR that integrates different types of dependency between Chinese characters. The role of a pair of dependent characters in the matching process is variable, depending on the pair’s ability to describe the underlying meaning and to retrieve relevant documents. The weight of the pair is learnt using SVM. Our experiments on TREC and NTCIR Chinese collections show that our model can significantly outperform most existing approaches. The results confirm the necessity to integrate dependent pairs of characters in Chinese IR and to use them according to their possible contribution to IR
A Unifying View of Multiple Kernel Learning
Recent research on multiple kernel learning has lead to a number of
approaches for combining kernels in regularized risk minimization. The proposed
approaches include different formulations of objectives and varying
regularization strategies. In this paper we present a unifying general
optimization criterion for multiple kernel learning and show how existing
formulations are subsumed as special cases. We also derive the criterion's dual
representation, which is suitable for general smooth optimization algorithms.
Finally, we evaluate multiple kernel learning in this framework analytically
using a Rademacher complexity bound on the generalization error and empirically
in a set of experiments
Competing with stationary prediction strategies
In this paper we introduce the class of stationary prediction strategies and
construct a prediction algorithm that asymptotically performs as well as the
best continuous stationary strategy. We make mild compactness assumptions but
no stochastic assumptions about the environment. In particular, no assumption
of stationarity is made about the environment, and the stationarity of the
considered strategies only means that they do not depend explicitly on time; we
argue that it is natural to consider only stationary strategies even for highly
non-stationary environments.Comment: 20 page
A Study of Machine Learning Techniques for Daily Solar Energy Forecasting using Numerical Weather Models
Proceedings of: 8th International Symposium on Intelligent Distributed Computing (IDC'2014). Madrid, September 3-5, 2014Forecasting solar energy is becoming an important issue in the context of renewable energy sources and Machine Learning Algorithms play an important rule in this field. The prediction of solar energy can be addressed as a time series prediction problem using historical data. Also, solar energy forecasting can be derived from numerical weather prediction models (NWP). Our interest is focused on the latter approach.We focus on the problem of predicting solar energy from NWP computed from GEFS, the Global Ensemble Forecast System, which predicts meteorological variables for points in a grid. In this context, it can be useful to know how prediction accuracy improves depending on the number of grid nodes used as input for the machine learning techniques. However, using the variables from a large number of grid nodes can result in many attributes which might degrade the generalization performance of the learning algorithms. In this paper both issues are studied using data supplied by Kaggle for the State of Oklahoma comparing Support Vector Machines and Gradient Boosted Regression. Also, three different feature selection methods have been tested: Linear Correlation, the ReliefF algorithm and, a new method based on local information analysis.Publicad
Improving the Accuracy of a Two-Stage Algorithm in Evolutionary Product Unit Neural Networks for Classification by Means of Feature Selection
This paper introduces a methodology that improves the accuracy
of a two-stage algorithm in evolutionary product unit neural networks
for classification tasks by means of feature selection. A couple
of filters have been taken into consideration to try out the proposal.
The experimentation has been carried out on seven data sets from the
UCI repository that report test mean accuracy error rates about twenty
percent or above with reference classifiers such as C4.5 or 1-NN. The
study includes an overall empirical comparison between the models obtained
with and without feature selection. Also several classifiers have
been tested in order to illustrate the performance of the different filters
considered. The results have been contrasted with nonparametric statistical
tests and show that our proposal significantly improves the test
accuracy of the previous models for the considered data sets. Moreover,
the current proposal is much more efficient than a previous methodology
developed by us; lastly, the reduction percentage in the number of inputs
is above a fifty five, on average.MICYT TIN2007-68084-C02-02MICYT TIN2008-06681-C06-03Junta de Andalucía P08-TIC-374
Generation and matching of ontology data for the semantic web in a peer-to-peer framework
The abundance of ontology data is very crucial to the emerging semantic web. This paper proposes a framework that supports the generation of ontology data in a peer-to-peer environment. It not only enables users to convert existing structured data to ontology data aligned with given ontology schemas, but also allows them to publish new ontology data with ease. Besides ontology data generation, the common issue of data overlapping over the peers is addressed by the process of ontology data matching in the framework. This process helps turn the implicitly related data among the peers caused by overlapping into explicitly interlinked ontology data, which increases the overall quality of the ontology data. To improve matching accuracy, we explore ontology related features in the matching process. Experiments show that adding these features achieves better overall performance than using traditional features only. © Springer-Verlag Berlin Heidelberg 2007
Modeling User Search Behavior for Masquerade Detection
Masquerade attacks are a common security problem that is a consequence of identity theft. This paper extends prior work by modeling user search behavior to detect deviations indicating a masquerade attack. We hypothesize that each individual user knows their own file system well enough to search in a limited, targeted and unique fashion in order to find information germane to their current task. Masqueraders, on the other hand, will likely not know the file system and layout of another user's desktop, and would likely search more extensively and broadly in a manner that is different than the victim user being impersonated. We identify actions linked to search and information access activities, and use them to build user models. The experimental results show that modeling search behavior reliably detects all masqueraders with a very low false positive rate of 1.1%, far better than prior published results. The limited set of features used for search behavior modeling also results in large performance gains over the same modeling techniques that use larger sets of features
- …